AITopics | event boundary

Collaborating Authors

event boundary

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Hyper-HMM: aligning human brains and semantic features in a common latent event space

Neural Information Processing SystemsFeb-12-2026, 00:05:46 GMT

Here, we propose a hybrid model, the Hyper-HMM, that simultaneously aligns both temporal and spatial features across brains.

artificial intelligence, machine learning, natural language, (19 more...)

Neural Information Processing Systems

Country: North America > United States > New Hampshire > Grafton County > Hanover (0.05)

Genre: Research Report > New Finding (0.47)

Industry:

Health & Medicine > Therapeutic Area > Neurology (1.00)
Education (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.66)

Add feedback

Hyper-HMM: aligning human brains and semantic features in a common latent event space

Neural Information Processing SystemsOct-8-2025, 17:35:15 GMT

Here, we propose a hybrid model, the Hyper-HMM, that simultaneously aligns both temporal and spatial features across brains.

artificial intelligence, machine learning, natural language, (19 more...)

Neural Information Processing Systems

Country: North America > United States > New Hampshire > Grafton County > Hanover (0.05)

Genre: Research Report > New Finding (0.47)

Industry:

Health & Medicine > Therapeutic Area > Neurology (1.00)
Education (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.66)

Add feedback

Uneven Event Modeling for Partially Relevant Video Retrieval

Zhu, Sa, Chen, Huashan, Zhang, Wanqian, Zhang, Jinchao, Yang, Zexian, Hao, Xiaoshuai, Li, Bo

arXiv.org Artificial IntelligenceJun-4-2025

Given a text query, partially relevant video retrieval (PRVR) aims to retrieve untrimmed videos containing relevant moments, wherein event modeling is crucial for partitioning the video into smaller temporal events that partially correspond to the text. Previous methods typically segment videos into a fixed number of equal-length clips, resulting in ambiguous event boundaries. Additionally, they rely on mean pooling to compute event representations, inevitably introducing undesired misalignment. To address these, we propose an Uneven Event Modeling (UEM) framework for PRVR. We first introduce the Progressive-Grouped Video Segmentation (PGVS) module, to iteratively formulate events in light of both temporal dependencies and semantic similarity between consecutive frames, enabling clear event boundaries. Furthermore, we also propose the Context-Aware Event Refinement (CAER) module to refine the event representation conditioned the text's cross-attention. This enables event representations to focus on the most relevant frames for a given text, facilitating more precise text-video alignment. Extensive experiments demonstrate that our method achieves state-of-the-art performance on two PRVR benchmarks. Code is available at https://github.com/Sasa77777779/UEM.git.

artificial intelligence, machine learning, natural language, (12 more...)

arXiv.org Artificial Intelligence

2506.00891

Country: Asia > China (0.14)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.95)

Add feedback

Event Segmentation Applications in Large Language Model Enabled Automated Recall Assessments

Panela, Ryan A., Barnett, Alex J., Barense, Morgan D., Herrmann, Björn

arXiv.org Artificial IntelligenceFeb-18-2025

Understanding how individuals perceive and recall information in their natural environments is critical to understanding potential failures in perception (e.g., sensory loss) and memory (e.g., dementia). Event segmentation, the process of identifying distinct events within dynamic environments, is central to how we perceive, encode, and recall experiences. This cognitive process not only influences moment-to-moment comprehension but also shapes event specific memory. Despite the importance of event segmentation and event memory, current research methodologies rely heavily on human judgements for assessing segmentation patterns and recall ability, which are subjective and time-consuming. A few approaches have been introduced to automate event segmentation and recall scoring, but validity with human responses and ease of implementation require further advancements. To address these concerns, we leverage Large Language Models (LLMs) to automate event segmentation and assess recall, employing chat completion and text-embedding models, respectively. We validated these models against human annotations and determined that LLMs can accurately identify event boundaries, and that human event segmentation is more consistent with LLMs than among humans themselves. Using this framework, we advanced an automated approach for recall assessments which revealed semantic similarity between segmented narrative events and participant recall can estimate recall performance. Our findings demonstrate that LLMs can effectively simulate human segmentation patterns and provide recall evaluations that are a scalable alternative to manual scoring. This research opens novel avenues for studying the intersection between perception, memory, and cognitive impairment using methodologies driven by artificial intelligence.

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2502.13349

Country:

North America > Canada > Quebec > Montreal (0.14)
North America > Canada > Ontario > Toronto (0.14)
Europe > Finland > Uusimaa > Helsinki (0.04)
(2 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.93)

Industry: Health & Medicine > Therapeutic Area > Neurology (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

LongVALE: Vision-Audio-Language-Event Benchmark Towards Time-Aware Omni-Modal Perception of Long Videos

Geng, Tiantian, Zhang, Jinrui, Wang, Qingni, Wang, Teng, Duan, Jinming, Zheng, Feng

arXiv.org Artificial IntelligenceDec-6-2024

Despite impressive advancements in video understanding, most efforts remain limited to coarse-grained or visual-only video tasks. However, real-world videos encompass omni-modal information (vision, audio, and speech) with a series of events forming a cohesive storyline. The lack of multi-modal video data with fine-grained event annotations and the high cost of manual labeling are major obstacles to comprehensive omni-modality video perception. To address this gap, we propose an automatic pipeline consisting of high-quality multi-modal video filtering, semantically coherent omni-modal event boundary detection, and cross-modal correlation-aware event captioning. In this way, we present LongVALE, the first-ever Vision-Audio-Language Event understanding benchmark comprising 105K omni-modal events with precise temporal boundaries and detailed relation-aware captions within 8.4K high-quality long videos. Further, we build a baseline that leverages LongVALE to enable video large language models (LLMs) for omni-modality fine-grained temporal video understanding for the first time. Extensive experiments demonstrate the effectiveness and great potential of LongVALE in advancing comprehensive multi-modal video understanding.

large language model, machine learning, natural language, (22 more...)

arXiv.org Artificial Intelligence

2411.19772

Country: Asia > China > Hong Kong (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)

Add feedback

DIBS: Enhancing Dense Video Captioning with Unlabeled Videos via Pseudo Boundary Enrichment and Online Refinement

Wu, Hao, Liu, Huabin, Qiao, Yu, Sun, Xiao

arXiv.org Artificial IntelligenceApr-3-2024

We present Dive Into the BoundarieS (DIBS), a novel pretraining framework for dense video captioning (DVC), that elaborates on improving the quality of the generated event captions and their associated pseudo event boundaries from unlabeled videos. By leveraging the capabilities of diverse large language models (LLMs), we generate rich DVC-oriented caption candidates and optimize the corresponding pseudo boundaries under several meticulously designed objectives, considering diversity, event-centricity, temporal ordering, and coherence. Moreover, we further introduce a novel online boundary refinement strategy that iteratively improves the quality of pseudo boundaries during training. Comprehensive experiments have been conducted to examine the effectiveness of the proposed technique components. By leveraging a substantial amount of unlabeled video data, such as HowTo100M, we achieve a remarkable advancement on standard DVC datasets like YouCook2 and ActivityNet. We outperform the previous state-of-the-art Vid2Seq across a majority of metrics, achieving this with just 0.4% of the unlabeled video data used for pre-training by Vid2Seq.

boundary, caption, video, (15 more...)

arXiv.org Artificial Intelligence

2404.02755

Country: Asia > China > Shanghai > Shanghai (0.04)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Towards Multimodal Video Paragraph Captioning Models Robust to Missing Modality

Chen, Sishuo, Li, Lei, Ren, Shuhuai, Gao, Rundong, Liu, Yuanxin, Bi, Xiaohan, Sun, Xu, Hou, Lu

arXiv.org Artificial IntelligenceMar-28-2024

Video paragraph captioning (VPC) involves generating detailed narratives for long videos, utilizing supportive modalities such as speech and event boundaries. However, the existing models are constrained by the assumption of constant availability of a single auxiliary modality, which is impractical given the diversity and unpredictable nature of real-world scenarios. To this end, we propose a Missing-Resistant framework MR-VPC that effectively harnesses all available auxiliary inputs and maintains resilience even in the absence of certain modalities. Under this framework, we propose the Multimodal VPC (MVPC) architecture integrating video, speech, and event boundary inputs in a unified manner to process various auxiliary inputs. Moreover, to fortify the model against incomplete data, we introduce DropAM, a data augmentation strategy that randomly omits auxiliary inputs, paired with DistillAM, a regularization target that distills knowledge from teacher models trained on modality-complete data, enabling efficient learning in modality-deficient environments. Through exhaustive experimentation on YouCook2 and ActivityNet Captions, MR-VPC has proven to deliver superior performance on modality-complete and modality-missing test data. This work highlights the significance of developing resilient VPC models and paves the way for more adaptive, robust multimodal video understanding.

modality, proceedings, video, (15 more...)

arXiv.org Artificial Intelligence

2403.19221

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > California > San Diego County > San Diego (0.04)
North America > United States > California > Los Angeles County > Long Beach (0.04)
(2 more...)

Genre: Research Report (0.82)

Industry: Education (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Vision (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

LLMVA-GEBC: Large Language Model with Video Adapter for Generic Event Boundary Captioning

Tang, Yunlong, Zhang, Jinrui, Wang, Xiangchen, Wang, Teng, Zheng, Feng

arXiv.org Artificial IntelligenceJun-17-2023

Our winning entry for the CVPR 2023 Generic Event Boundary Captioning (GEBC) competition is detailed in this paper. Unlike conventional video captioning tasks, GEBC demands that the captioning model possess an understanding of immediate changes in status around the designated video boundary, making it a difficult task. This paper proposes an effective model LLMVA-GEBC (Large Language Model with Video Adapter for Generic Event Boundary Captioning): (1) We utilize a pretrained LLM for generating human-like captions with high quality. (2) To adapt the model to the GEBC task, we take the video Q-former as an adapter and train it with the frozen visual feature extractors and LLM. Our proposed method achieved a 76.14 score on the test set and won the first place in the challenge. Our code is available at https://github.com/zjr2000/LLMVA-GEBC .

artificial intelligence, large language model, natural language, (16 more...)

arXiv.org Artificial Intelligence

2306.10354

Genre: Research Report (0.83)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback

Large language models can segment narrative events similarly to humans

Michelmann, Sebastian, Kumar, Manoj, Norman, Kenneth A., Toneva, Mariya

arXiv.org Artificial IntelligenceJan-24-2023

Humans perceive discrete events such as "restaurant visits" and "train rides" in their continuous experience. One important prerequisite for studying human event perception is the ability of researchers to quantify when one event ends and another begins. Typically, this information is derived by aggregating behavioral annotations from several observers. Here we present an alternative computational approach where event boundaries are derived using a large language model, GPT-3, instead of using human annotations. We demonstrate that GPT-3 can segment continuous narrative text into events. GPT-3-annotated events are significantly correlated with human event annotations. Furthermore, these GPT-derived annotations achieve a good approximation of the "consensus" solution (obtained by averaging across human annotations); the boundaries identified by GPT-3 are closer to the consensus, on average, than boundaries identified by individual human annotators. This finding suggests that GPT-3 provides a feasible solution for automated event annotations, and it demonstrates a further parallel between human cognition and prediction in large language models. In the future, GPT-3 may thereby help to elucidate the principles underlying human event perception.

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2301.10297

Country:

North America > United States > New Jersey > Mercer County > Princeton (0.04)
North America > United States > New York > Bronx County > New York City (0.04)
Europe > Germany > Saarland > Saarbrücken (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.96)

Industry:

Media (1.00)
Health & Medicine > Therapeutic Area > Neurology (0.68)
Leisure & Entertainment (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Predictive Event Segmentation and Representation with Neural Networks: A Self-Supervised Model Assessed by Psychological Experiments

Basgol, Hamit, Ayhan, Inci, Ugur, Emre

arXiv.org Artificial IntelligenceOct-4-2022

People segment complex, ever-changing and continuous experience into basic, stable and discrete spatio-temporal experience units, called events. Event segmentation literature investigates the mechanisms that allow people to extract events. Event segmentation theory points out that people predict ongoing activities and observe prediction error signals to find event boundaries that keep events apart. In this study, we investigated the mechanism giving rise to this ability by a computational model and accompanying psychological experiments. Inspired from event segmentation theory and predictive processing, we introduced a self-supervised model of event segmentation. This model consists of neural networks that predict the sensory signal in the next time-step to represent different events, and a cognitive model that regulates these networks on the basis of their prediction errors. In order to verify the ability of our model in segmenting events, learning them during passive observation, and representing them in its internal representational space, we prepared a video that depicts human behaviors represented by point-light displays. We compared event segmentation behaviors of participants and our model with this video in two hierarchical event segmentation levels. By using point-biserial correlation technique, we demonstrated that event segmentation decisions of our model correlated with the responses of participants. Moreover, by approximating representation space of participants by a similarity-based technique, we showed that our model formed a similar representation space with those of participants. The result suggests that our model that tracks the prediction error signals can produce human-like event boundaries and event representations. Finally, we discussed our contribution to the literature of event cognition and our understanding of how event segmentation is implemented in the brain.

artificial intelligence, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2210.0571

Country:

Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.04)
Europe > Middle East > Republic of Türkiye > Istanbul Province > Istanbul (0.04)
Asia > Middle East > Republic of Türkiye > Istanbul Province > Istanbul (0.04)
(4 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry: Health & Medicine > Therapeutic Area > Neurology (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science (1.00)

Add feedback